OCPBUGS-83580: Load fuse kernel module before testing /dev/fuse in CRI-O#31044
OCPBUGS-83580: Load fuse kernel module before testing /dev/fuse in CRI-O#31044Chandan9112 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@Chandan9112: This pull request references Jira Issue OCPBUGS-83580, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (cmaurya@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Chandan9112 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
WalkthroughThe dev-fuse e2e test now selects worker nodes, asserts at least one exists, queries CRI-O’s Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner
participant KubernetesAPI
participant WorkerNode
participant CRIO
TestRunner->>KubernetesAPI: List nodes (label=worker)
KubernetesAPI-->>TestRunner: Node list
TestRunner->>WorkerNode: Exec command to read CRI-O config (e.g., cat /etc/crio/crio.conf)
WorkerNode->>CRIO: Access CRI-O config
CRIO-->>WorkerNode: default_runtime value
WorkerNode-->>TestRunner: command output
alt default_runtime contains "runc"
TestRunner->>TestRunner: Skip test
else
TestRunner->>TestRunner: Create namespace (reuse err variable)
TestRunner->>WorkerNode: modprobe fuse (for each worker)
TestRunner->>TestRunner: Continue test setup
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@Chandan9112: This pull request references Jira Issue OCPBUGS-83580, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (cmaurya@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 115-120: The current logic only loads the fuse module on the first
worker fetched (jsonpath {.items[0]}) which fails when the test pod schedules to
another worker; change the node discovery to fetch all worker names (e.g.,
jsonpath {.items[*].metadata.name}), split the output into individual node
names, then loop over each node and call nodeutils.ExecOnNodeWithChroot(oc,
strings.TrimSpace(nodeName), "modprobe", "fuse"), checking
o.Expect(err).NotTo(o.HaveOccurred()) for each invocation so every schedulable
worker has /dev/fuse loaded before the pod runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 2913005d-20d0-4383-82e0-aba7fc8eb55e
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
aefb892 to
28fc03c
Compare
|
/retest |
|
/test images |
1 similar comment
|
/test images |
|
Scheduling required tests: |
|
/retest |
|
@Chandan9112: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload 4.22 nightly blocking |
|
@BhargaviGudi: trigger 13 job(s) of type blocking for the nightly release of OCP 4.22
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/38210840-3e0a-11f1-8a06-a9e051c3da43-0 |
|
/retest |
|
Job Failure Risk Analysis for sha: 28fc03c
|
| workers := strings.Fields(nodeNames) | ||
| o.Expect(workers).NotTo(o.BeEmpty(), "No worker nodes found") | ||
| for _, worker := range workers { | ||
| _, err = nodeutils.ExecOnNodeWithChroot(oc, worker, "modprobe", "fuse") |
There was a problem hiding this comment.
This line changes the state of the node permanently by adding fuse. So all tests following this is impacted.
So we either need to unload fuse if its loaded here after the test case or run this test on a new node and destroy it.
You could try modprobe -r fuse in a defer, but this could fail if something is using it
28fc03c to
1ca4217
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 115-123: The runtime-based skip removed the prior host-side
fuse-module precondition and can allow tests to proceed without /dev/fuse
present; restore the fuse check or add an explicit /dev/fuse verification on the
worker before continuing. Specifically, before the runtime check that uses
workerName and nodeutils.ExecOnNodeWithChroot, either reinsert the
modprobe/ensure-fuse step that was originally run on the host or run a command
via nodeutils.ExecOnNodeWithChroot (using strings.TrimSpace(workerName)) to test
for the presence of /dev/fuse (e.g., test -e /dev/fuse) and fail/skip if
missing; ensure this check executes unconditionally prior to creating pods so
the original fuse-module precondition is enforced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 351c77c3-95a9-4ec3-ae03-a1d06e555048
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
| g.By("Check if the default CRI-O runtime is runc") | ||
| workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found") | ||
| runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'") | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| if strings.Contains(runtime, "runc") { | ||
| g.Skip("Skipping: not applicable to runc runtime") | ||
| } |
There was a problem hiding this comment.
The runtime skip doesn’t replace the fuse-module precondition.
This still goes straight to pod setup without any host-side modprobe fuse check, so the original /dev/fuse flake can still happen on non-runc workers. Please keep the fuse-module step before this logic, or verify /dev/fuse exists on the worker before continuing.
Suggested fix
g.By("Check if the default CRI-O runtime is runc")
workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found")
runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'")
o.Expect(err).NotTo(o.HaveOccurred())
if strings.Contains(runtime, "runc") {
g.Skip("Skipping: not applicable to runc runtime")
}
+
+g.By("Ensure fuse kernel module is loaded before creating the pod")
+_, err = nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "modprobe fuse")
+o.Expect(err).NotTo(o.HaveOccurred())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| g.By("Check if the default CRI-O runtime is runc") | |
| workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output() | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found") | |
| runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'") | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| if strings.Contains(runtime, "runc") { | |
| g.Skip("Skipping: not applicable to runc runtime") | |
| } | |
| g.By("Check if the default CRI-O runtime is runc") | |
| workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output() | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found") | |
| runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'") | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| if strings.Contains(runtime, "runc") { | |
| g.Skip("Skipping: not applicable to runc runtime") | |
| } | |
| g.By("Ensure fuse kernel module is loaded before creating the pod") | |
| _, err = nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "modprobe fuse") | |
| o.Expect(err).NotTo(o.HaveOccurred()) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/node/node_e2e/node.go` around lines 115 - 123, The
runtime-based skip removed the prior host-side fuse-module precondition and can
allow tests to proceed without /dev/fuse present; restore the fuse check or add
an explicit /dev/fuse verification on the worker before continuing.
Specifically, before the runtime check that uses workerName and
nodeutils.ExecOnNodeWithChroot, either reinsert the modprobe/ensure-fuse step
that was originally run on the host or run a command via
nodeutils.ExecOnNodeWithChroot (using strings.TrimSpace(workerName)) to test for
the presence of /dev/fuse (e.g., test -e /dev/fuse) and fail/skip if missing;
ensure this check executes unconditionally prior to creating pods so the
original fuse-module precondition is enforced.
There was a problem hiding this comment.
All failures were on clusters using runc as the default CRI-O runtime, which does not include io.kubernetes.cri-o.Devices in its allowed_annotations. On crun clusters, /dev/fuse has always been available and the test has never failed.
There was a problem hiding this comment.
@Chandan9112, thank you for the detailed clarification. That makes sense — since io.kubernetes.cri-o.Devices is not listed in allowed_annotations for runc-based clusters, CRI-O would silently ignore the device annotation regardless of whether /dev/fuse exists on the host, making the test meaningless and prone to false positives/failures in that environment. Skipping early on runc is the correct fix. I'll withdraw my concern.
📝 Learnings were identified and not saved (knowledge base disabled). Enable
|
Scheduling required tests: |
Bug
OCPBUGS-83580
Follow-up PR
#PR
Problem
The OCP-70987 test (Allow dev fuse by default in CRI-O) was failing across multiple CI platforms (AWS, GCP, vSphere, Metal+IPv6) with error: stat: cannot statx '/dev/fuse': No such file or directory
The test relies on the CRI-O annotation io.kubernetes.cri-o.Devices: "/dev/fuse" to mount /dev/fuse into the pod. However, this annotation is only supported by the crun runtime. When the default CRI-O runtime is runc, it does not include io.kubernetes.cri-o.Devices in its allowed_annotations, so CRI-O silently ignores the annotation and the pod starts without /dev/fuse mounted.
Root Cause
The test did not account for clusters using runc as the default CRI-O runtime. The /dev/fuse device annotation is a crun-specific feature, and running the test on runc nodes will always fail.
Fix
Added a runtime check before test execution: query crio config on a worker node to detect the default runtime
Skip the test with a clear message when runc is the default runtime, since the /dev/fuse device annotation is not applicable
Replaced the previous modprobe fuse workaround with this more targeted skip
Testing
Tested on a fresh OCP 4.22 GCP cluster (nightly 4.22.0-0.nightly-2026-04-26-170224):
crun (default): Test passed — /dev/fuse successfully mounted inside the pod
runc (via ContainerRuntimeConfig): Test correctly skipped with message: Skipping: not applicable to runc runtime
Summary by CodeRabbit